Meta Llama 3.3 70B Instruct - Qubrid Documentation

About the Provider

Meta is a leading global technology company focusing on social media, connectivity, and artificial intelligence research. Meta develops advanced AI models, such as the Llama family, to give developers and enterprises scalable language understanding and generation capabilities. Its open-weight releases aim to foster innovation and broaden community access to powerful AI tools.

Model Quickstart

This section helps you quickly get started with the meta-llama/Llama-3.3-70B-Instruct model on the Qubrid AI inferencing platform. To use this model, you need:

A valid Qubrid API key
Access to the Qubrid inference API
Basic knowledge of making API requests in your preferred language

Once authenticated with your API key, you can send inference requests to the meta-llama/Llama-3.3-70B-Instruct model and receive responses based on your input prompts. The example below shows how to call the model from Python using the OpenAI SDK pointed at the Qubrid base URL.

import os

from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL.
# The API key is read from the QUBRID_API_KEY environment variable.
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key=os.environ["QUBRID_API_KEY"],
)

# Set to False to receive the whole reply in a single response object
STREAM = True

# Create a chat completion
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.9,
    stream=STREAM,
)

if STREAM:
    # Print each content delta as it arrives
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
else:
    # Print the complete reply from the non-streaming response
    print(response.choices[0].message.content)
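When streaming, each chunk carries only a small delta of the reply. If you also need the complete text afterwards (for logging, or for appending to a conversation history), you can accumulate the deltas while printing them. The sketch below assumes chunks shaped like those returned by the OpenAI Python SDK (`chunk.choices[0].delta.content`); the helper names `collect_stream` and `fake_chunk` are hypothetical, and the example uses stand-in objects instead of a live API call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Join the content deltas of a streamed chat completion into one string.

    `stream` is any iterable of chunks exposing `chunk.choices[0].delta.content`,
    such as the object returned by `client.chat.completions.create(..., stream=True)`.
    """
    parts = []
    for chunk in stream:
        # Some chunks (e.g. the final one) may have no choices or empty content
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    if echo:
        print()
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in chunk for local testing (no network call)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    chunks = [
        fake_chunk("Quantum "),
        fake_chunk("bits"),
        fake_chunk(None),              # chunk without content
        SimpleNamespace(choices=[]),   # chunk without choices
    ]
    full_text = collect_stream(chunks)
```

With a live client, `full_text = collect_stream(response)` can take the place of the `for` loop in the quickstart.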

Model Overview

Llama 3.3 70B Instruct is a 70B-parameter open-weight large language model from Meta, optimized for instruction following, complex reasoning, and multi-turn conversations. It is well suited for enterprise use cases such as advanced chat assistants, code reasoning, and long-document analysis, thanks to its 128K-token context window.
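Multi-turn conversations work by resending the full message history with every request, with the model's earlier replies included as `assistant` messages. A minimal sketch, assuming the OpenAI-compatible client from the quickstart; `chat_turn` is a hypothetical helper, and `fake_client` is a stand-in so the example runs offline.

```python
from types import SimpleNamespace

MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"


def chat_turn(client, history: list, user_message: str, **params) -> str:
    """Send one user turn, record the assistant reply in `history`, and return it."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=history,
        stream=False,
        **params,
    )
    reply = response.choices[0].message.content
    # Store the reply so the next turn sends it back as context
    history.append({"role": "assistant", "content": reply})
    return reply


class FakeCompletions:
    """Stand-in for client.chat.completions that reports the user-turn count."""

    def create(self, model, messages, stream, **params):
        user_turns = sum(1 for m in messages if m["role"] == "user")
        message = SimpleNamespace(content=f"reply {user_turns}")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a concise assistant."}]
    print(chat_turn(fake_client, history, "What is a qubit?"))
    print(chat_turn(fake_client, history, "And superposition?"))
```

With a real connection, pass the quickstart's `client` instead of `fake_client`; extra keyword arguments such as `temperature=0.7` are forwarded to the request.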

Model at a Glance

Feature	Details
Model ID	meta-llama/Llama-3.3-70B-Instruct
Architecture	Transformer with Grouped-Query Attention (GQA)
Model Size	70B parameters
Inference Parameters	4 (Streaming, Temperature, Max Tokens, Top P)
Training Data	Publicly available web data (multilingual)
Context Length	128K tokens
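The context window is shared between the prompt and the generated output, so the prompt length plus `max_tokens` generally has to fit inside it. The sketch below checks that budget before a request is sent. It assumes "128K" means 131,072 tokens and that the full window is available on the platform; because it does not load the Llama 3 tokenizer, it estimates prompt size with a rough four-characters-per-token heuristic. The function names are hypothetical; for exact counts, tokenize with the model's own tokenizer.

```python
CONTEXT_WINDOW = 131_072  # assumption: "128K" interpreted as 128 * 1024 tokens
CHARS_PER_TOKEN = 4       # rough heuristic, not the real Llama 3 tokenizer
PER_MESSAGE_OVERHEAD = 4  # assumed allowance for role and formatting tokens


def estimate_tokens(messages: list) -> int:
    """Roughly estimate the prompt size of a chat `messages` list."""
    chars = sum(len(m["content"]) for m in messages)
    return chars // CHARS_PER_TOKEN + PER_MESSAGE_OVERHEAD * len(messages)


def fits_context(messages: list, max_tokens: int = 4096) -> bool:
    """Return True if the estimated prompt plus max_tokens fits the window."""
    return estimate_tokens(messages) + max_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    short = [{"role": "user", "content": "Explain quantum computing in simple terms"}]
    long_doc = [{"role": "user", "content": "x" * 600_000}]
    print(fits_context(short))     # True
    print(fits_context(long_doc))  # False
```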

Supported languages

English
German
French
Italian
Portuguese
Hindi
Spanish
Thai

When to use?

Use Llama 3.3 70B Instruct if you need:

Enterprise chat assistants
Advanced code generation and review
Long-document question answering
Summarization at scale
Retrieval-Augmented Generation (RAG)
AI agents and workflow automation
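For Retrieval-Augmented Generation, retrieved passages are typically placed in the prompt next to the user's question, together with an instruction to answer from that context. The sketch below only assembles the `messages` list; retrieval itself (vector or keyword search) is out of scope. The prompt wording and the `build_rag_messages` name are illustrative assumptions, not a Qubrid or Meta recommendation.

```python
def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Assemble chat messages that ground the answer in retrieved passages."""
    # Number each passage so the model can cite them as [1], [2], ...
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    system = (
        "Answer the question using only the numbered context passages. "
        "Cite passages like [1]. If the answer is not in the context, say so."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


if __name__ == "__main__":
    msgs = build_rag_messages(
        "What context length does the model support?",
        [
            "Llama 3.3 70B Instruct supports a 128K-token context.",
            "The model is released with open weights.",
        ],
    )
    print(msgs[1]["content"])
```

The returned list can be passed as `messages=` in the quickstart request.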

Inference Parameters

Parameter Name	Type	Default	Description
Streaming (stream)	boolean	true	Streams the response incrementally for real-time output.
Temperature (temperature)	number	0.7	Controls randomness. Higher values produce more varied but less predictable output; lower values make output more focused.
Max Tokens (max_tokens)	number	4096	Maximum number of tokens the model may generate in the response.
Top P (top_p)	number	0.9	Nucleus sampling: restricts token selection to the smallest set of tokens whose cumulative probability reaches this value.
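These parameters correspond to the request fields used in the quickstart (`stream`, `temperature`, `max_tokens`, `top_p`). The sketch below fills in the documented defaults and rejects obviously invalid values before a request is sent. The accepted ranges (temperature from 0 to 2, top_p in (0, 1]) follow common OpenAI-compatible conventions and are assumptions; the limits Qubrid actually enforces may differ. `build_params` is a hypothetical helper.

```python
DEFAULTS = {
    "stream": True,
    "temperature": 0.7,
    "max_tokens": 4096,
    "top_p": 0.9,
}


def build_params(**overrides) -> dict:
    """Merge overrides onto the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}

    if not isinstance(params["stream"], bool):
        raise ValueError("stream must be a boolean")
    # Assumed range, based on common OpenAI-compatible conventions
    if not 0 <= params["temperature"] <= 2:
        raise ValueError("temperature must be between 0 and 2")
    max_tokens = params["max_tokens"]
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    if not 0 < params["top_p"] <= 1:
        raise ValueError("top_p must be in the range (0, 1]")
    return params


if __name__ == "__main__":
    print(build_params(temperature=0.2, stream=False))
```

A validated dictionary can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **build_params(stream=False))`.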

Key Features

High-quality reasoning and instruction adherence
Strong performance on code and analytical tasks
Large context window for long-document processing
Open-weight model suitable for private and on-prem deployments
Production-ready for enterprise workloads

Limitations

Context window (128K tokens) is smaller than that of the largest-context models available
Can struggle with highly complex, multi-step reasoning
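One common way to stay within a fixed context window during long conversations is to drop the oldest turns once the history grows too large, while keeping the system prompt. The sketch below does this with a rough four-characters-per-token estimate (an assumption; use the model's tokenizer for exact counts) and a hypothetical `trim_history` helper. Summarizing older turns instead of dropping them is an alternative this sketch does not cover.

```python
def estimate_tokens(message: dict) -> int:
    """Rough token estimate: ~4 characters per token plus a small assumed overhead."""
    return len(message["content"]) // 4 + 4


def trim_history(messages: list, budget: int) -> list:
    """Drop the oldest non-system messages until the estimate fits `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(estimate_tokens(m) for m in system + rest)
    # Remove from the front (oldest first), always keeping the latest message
    while len(rest) > 1 and total > budget:
        total -= estimate_tokens(rest.pop(0))
    return system + rest


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "a" * 400},
        {"role": "assistant", "content": "b" * 400},
        {"role": "user", "content": "Summarize our discussion."},
    ]
    print([m["role"] for m in trim_history(history, budget=100)])
```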

Summary

Meta Llama 3.3 70B Instruct is a 70B-parameter, instruction-tuned large language model designed for high-quality reasoning and multi-turn conversational tasks. It is well suited for enterprise workloads such as advanced chat assistants, code generation, summarization, and long-document question answering. The model supports a large context window, enabling effective processing of lengthy inputs and retrieval-augmented generation workflows.
